We describe an efficient algorithm for determining exactly the minimum number of sires consistent with the
multi-locus genotypes of a mother and her progeny. We consider cases where a simple exhaustive search through
all possible sets of sires is impossible in practice (because it would take too long to complete). Our algorithm
for solving this combinatorial optimisation problem avoids visiting large parts of the search space which would not
improve the solution found so far (i.e., result in a solution with fewer sires). This is of particular
importance when the number of allelic types in the progeny array is large and when the minimum number of sires
is expected to be large. Precisely in such cases it is important to know the minimum number of sires: this number
gives an exact lower bound on the most likely number of sires estimated by a random search algorithm in a parameter
region where it may be difficult to determine whether it has converged. We apply our algorithm to data from the
marine snail, Littorina saxatilis.
I. INTRODUCTION

A number of species from different taxa are known to mate numerous times during a mating season. Females of such
species are likely to give birth to offspring fathered by more than one sire, and in some cases the offspring may be
fathered by a large number of sires. For example, queens of the honeybee are known to leave the nest followed by hundreds
of males (Wattanachaiyingcharoen et al., 2003). Females of the saltwater fly mate many times a day during the mating season
(Blyth and Gilbourn, 2006), and in species of periwinkles (marine gastropods) females mate repeatedly during the mating season,
which is several months long (Reid, 1996; Saur, 1990).

The degree of multiple paternity can be inferred from empirical data, obtained for example by genotyping females and
offspring using high-resolution genetic markers, such as microsatellites or single nucleotide polymorphisms. Estimating the
number of sires corresponding to a given multi-locus data set is usually straightforward when the number of offspring, the number
of their allelic types, and the number of sires to be determined are small.

In the examples mentioned above, however, the levels of multiple paternity are often so high that the mathematical analysis
of the empirical data becomes very complex and time-consuming. In this paper we describe a new, efficient algorithm to analyse
multiple paternity in empirical data sets when the paternal genotypes are unknown. We apply the algorithm to data sets from the
periwinkle species Littorina saxatilis, which exhibits high levels of multiple paternity (it has been shown previously that at least 7 to 10
males father the offspring of one brood; Mäkinen et al., 2007).
Consider an example. Tab. I shows the multi-locus genotypes of 42 progeny from a brood of a female periwinkle. How
does one determine the number of sires from the data shown in Tab. I? More precisely, this question may be posed in at least
three different ways: what is the actual number of sires, what is the most likely number, and what is the minimum number of sires
consistent with the mother and her progeny array?

Unless one is able to observe the matings directly, it is in general not possible to determine the actual number of sires for an
array such as the one shown in Tab. I. An alternative is to determine instead the most likely number of sires consistent with the
multi-locus genotypes of mother and progeny (Wang, 2004). In this approach it is commonly assumed that the population is in
Hardy-Weinberg equilibrium, that all loci are subject to neutral evolution, and that all loci are in pairwise linkage equilibrium.
The most likely set of sires is determined by a random search algorithm (Press et al., 1986), locally maximising the likelihood
constrained by requiring consistency with mother and progeny. A random search, however, does not guarantee convergence. The
only way to make sure that the algorithm has converged is to compare it with an exhaustive search.

Exhaustive search algorithms have been published in the literature. An example available for download is GERUD (Jones,
2001, 2005). The exhaustive search is commonly conducted as follows: a list of paternal alleles at all loci is determined from
the alleles in the progeny array by subtracting those of the mother. If, for a given child at a given locus, the paternal allele cannot
be uniquely extracted, both candidates are kept in the list. From this list, a set of potential sires is obtained by constructing all possible
multi-locus paternal haplotypes. This list is pruned by removing individuals which are not consistent with any of the progeny in
the data. The minimum number of sires is determined by exhaustively searching this pruned set: first the algorithm tests whether
a single father from this set is consistent with the progeny array. If this is not the case, all pairs of sires are tested, then, if necessary,
all triplets of sires, and so forth. This algorithm ensures that the minimum number of sires consistent with the data is found.
It has been used successfully in many circumstances. It works well when the list of paternal alleles is not too long
and when the minimum number of fathers to be determined is not too large.
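The exhaustive search summarised above can be illustrated with the following minimal Python sketch. It is not the GERUD implementation: it assumes that each progeny has already been reduced to one unambiguous paternal allele per locus, and the function names and the `max_sires` cap (chosen to mirror the six-sire limit discussed below) are our own, hypothetical choices.

```python
from itertools import combinations, combinations_with_replacement, product
from typing import Optional, Sequence, Tuple

Progeny = Tuple[str, ...]  # hypothetical encoding: one paternal allele per locus


def fits(sire, child: Progeny) -> bool:
    """True if the sire carries the child's paternal allele at every locus."""
    return all(a in pair for a, pair in zip(child, sire))


def exhaustive_min_sires(progeny: Sequence[Progeny], max_sires: int = 6) -> Optional[int]:
    n_loci = len(progeny[0])
    # List of paternal alleles observed at each locus.
    alleles = [sorted({child[l] for child in progeny}) for l in range(n_loci)]
    # Candidate sires: every combination of one unordered allele pair per locus.
    per_locus = [list(combinations_with_replacement(a, 2)) for a in alleles]
    candidates = list(product(*per_locus))
    # Prune candidates that are consistent with none of the progeny.
    candidates = [s for s in candidates if any(fits(s, c) for c in progeny)]
    # Test all single sires, then all pairs, then all triplets, and so forth.
    for k in range(1, max_sires + 1):
        for sires in combinations(candidates, k):
            if all(any(fits(s, c) for s in sires) for c in progeny):
                return k
    return None  # the minimum exceeds max_sires
```

For the four-progeny example of Fig. 2 the sketch returns 2. For each $k$ the inner loop may visit all $\binom{m}{k}$ subsets of the $m$ pruned candidates, which is what makes this approach impractical when many sires are needed.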
The minimum number of sires for the data set shown in Tab. I is found to be twelve (using the algorithm described below).
With the search algorithm summarised above, one would have to go through a prohibitively large number of sets of possible
sires. Since the number of sets to search typically increases combinatorially with the number of sires, the exhaustive search in the
algorithm proposed by Jones (2005) is limited to at most six sires for a given sample. In practice, when the number of sires is
five or six, the algorithm may take very long to converge (Sefc et al., 2008). In many practical applications (Amavet et al., 2008;
Portnoy et al., 2007; Rispoli and Wilson, 2008; Simmons et al., 2008; Song et al., 2007; Takagi et al.; Van Dornik et al., 2008),
the number of sires determined with GERUD does not exceed four. (It should be noted that GERUD 2.0 allows the user
to truncate the pruned set of sires in an ad hoc fashion in order to check for up to eight sires. Because of the truncation, however, the
minimum number of sires is potentially overestimated, and by how much is unclear.)

A maximum-likelihood approach indicates that the number of sires in samples of L. saxatilis (Mäkinen et al., 2007) is typically
larger than six. We have therefore developed a new algorithm for determining the minimum number of sires consistent with a
given progeny array, such as that shown in Tab. I. The algorithm is described in Sec. II. It makes it possible to determine exactly
the minimum number of sires for the empirical data sets listed in Tab. II; in seven of the nine data sets the minimum number
of sires is found to be larger than six. Further, the new algorithm enables us to estimate the number of sires for a data set
when convergence of maximum-likelihood algorithms is difficult to ascertain. Fig. 3, for example, shows a run of the
maximum-likelihood algorithm COLONY (Wang, 2004), which estimates the most likely number of fathers given the population allele
frequencies, for the data shown in Tab. I, compared to the minimum number of sires (which is twelve for this data set). Fig. 3
raises the question of how likely it is that the minimum number of sires equals the actual number of sires for a given sample.
Using our algorithm we have investigated this question employing data sets generated by a coalescent algorithm within a
stepwise mutation model: the answer depends upon the properties of the population in question. In cases where the probability that
the two numbers are the same is high, we can infer that the actual number of sires can be reliably estimated from
empirical samples (it should be emphasized that the result of our algorithm is always an exact lower bound). Last but not least,
we employ our algorithm to answer the following question: given an empirical data set, how could one most efficiently increase
the accuracy of the estimate of the number of sires, by genotyping more loci for a given set of progeny, or by genotyping more
progeny for a given number of loci? Again, the answer depends upon the properties of the population in question.

The remainder of this article is organised as follows. In Sec. II we describe the new algorithm, called 'MinFathers'. We
also briefly describe how we produced artificial samples using the coalescent in order to test the new algorithm. Our results are
summarised in Sec. III. Conclusions are drawn in Sec. IV.



II. METHODS 

A. An efficient search algorithm 

In this section we describe our new algorithm which determines the minimum number of sires consistent with a given progeny
array, such as that shown in Tab. I.

Finding the minimal set of sires is equivalent to finding a partition of the progeny such that all progeny in a given element of
the partition can be inherited from a single father. A father is represented by a list of the two alleles at each locus. Each paternal
allele may either have a definite value, or no value may yet have been assigned to it. Each progeny, too, is represented
by a list of alleles, one for each locus, after the allele of the mother has been subtracted. Usually, the allelic types are uniquely
determined. There are, however, two exceptions: when a genotyping error has occurred, and when the mother and the offspring
are identical and heterozygous at a locus. In the latter case there are two possible paternal alleles for the progeny at the locus in question.
We ignore these complications for the moment and return to them after having described the algorithm in its simplest form.
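As an illustration of how the paternal alleles can be obtained, here is a minimal Python sketch (the function names are hypothetical) that subtracts the maternal alleles at one locus. It returns a single allele in the usual case, two candidates when mother and offspring are identical heterozygotes, and an empty set when the offspring cannot be reconciled with the mother, which we take as a sign of a genotyping error.

```python
from typing import List, Set, Tuple

Genotype = Tuple[int, int]  # the two alleles at one locus (e.g. repeat numbers)


def paternal_candidates(mother: Genotype, child: Genotype) -> Set[int]:
    """Alleles the child may have received from its father at one locus.

    An offspring allele can be paternal only if the other offspring allele
    could have come from the mother.
    """
    o1, o2 = child
    candidates = set()
    if o1 in mother:
        candidates.add(o2)
    if o2 in mother:
        candidates.add(o1)
    return candidates


def paternal_profile(mother: List[Genotype], child: List[Genotype]) -> List[Set[int]]:
    """Apply the rule above independently at every locus (hypothetical helper)."""
    return [paternal_candidates(m, c) for m, c in zip(mother, child)]
```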

In our algorithm, the most general common father for a set of progeny is found through a sequence of merging operations.
A merge maps the most general fathers of two sets of progeny to the most general father of the combined set. Since
we are always searching for the minimum number of fathers, the fathers are always taken to be heterozygous at each locus (or some
loci remain undetermined).

The merging of two fathers f and f' proceeds independently at each locus (since free recombination is assumed). Assume that
a common father f for a set of j progeny has been found, and add another progeny p to this set. Assume that p
has allelic type a at a given locus. Its most general father has the configuration {a, *} at this locus. The most general
common father f' of the joint set of j + 1 progeny is obtained by merging f and {a, *} as described in Tab. III. At a given locus,
the father f' may have several possible configurations, depending on the configurations of f and of the father of p. In the table, the
asterisk denotes that the corresponding allelic type has not yet been determined, or is unknown. The most general father of a
single progeny is {a, *}.
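The merge rule can be sketched in Python as follows. This is our reading of the rule for merging most general fathers, not a transcription of Tab. III: each locus of a father is stored as the set of alleles assigned so far (a set with one element stands for {a, *}), and a merge fails when a locus would need more than two alleles. The type and function names are hypothetical.

```python
from typing import FrozenSet, Optional, Tuple

# Hypothetical encoding: one entry per locus holding the paternal alleles
# assigned so far; fewer than two alleles means a slot is still "*".
Father = Tuple[FrozenSet[int], ...]


def father_of(progeny: Tuple[int, ...]) -> Father:
    """Most general father of a single progeny: {a, *} at every locus."""
    return tuple(frozenset([a]) for a in progeny)


def merge(f: Father, g: Father) -> Optional[Father]:
    """Most general common father of f and g, or None if there is none.

    Loci are merged independently. A diploid father carries at most two
    alleles per locus, so a union of more than two alleles is invalid.
    """
    merged = []
    for f_locus, g_locus in zip(f, g):
        union = f_locus | g_locus
        if len(union) > 2:
            return None
        merged.append(union)
    return tuple(merged)
```

Merging the fathers of two progeny with alleles (1, 2) and (1, 3) gives {1, *} at the first locus and {2, 3} at the second; a third progeny with allele 5 at the second locus can then no longer be merged.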

Now consider a partition F of j progeny. We introduce the following terminology: a partition is called valid if for each
element of the partition there is a common father for all progeny in that element. In other words, a valid partition of progeny
corresponds to a set of fathers for the progeny in question. Our algorithm can now be formulated as follows: for each valid
partition F of progeny 1, ..., j, generate all valid partitions F' of progeny 1, ..., j + 1 by adding the new progeny j + 1 to each
element of F, provided that its father merges with the common father of that element into a valid common father, or by placing it
in a new element of its own. Starting from an empty set of fathers, $F = \emptyset$, and a set of progeny S, we can find all valid sets
of fathers of S by this algorithm.
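The incremental construction of valid partitions can be sketched as below, with a merge rule in which each locus of a father is the set of alleles assigned so far and at most two are allowed. This is a minimal, unoptimised illustration under those assumptions (all names hypothetical); it enumerates every valid partition, without the pruning described below.

```python
from typing import FrozenSet, List, Optional, Tuple

Father = Tuple[FrozenSet[str], ...]
Block = Tuple[Father, Tuple[int, ...]]  # (common father, indices of its progeny)


def father_of(child: Tuple[str, ...]) -> Father:
    return tuple(frozenset([a]) for a in child)


def merge(f: Father, g: Father) -> Optional[Father]:
    merged = tuple(fl | gl for fl, gl in zip(f, g))
    return merged if all(len(m) <= 2 for m in merged) else None


def valid_partitions(progeny) -> List[List[Block]]:
    partitions: List[List[Block]] = [[]]  # start from the empty set of fathers
    for idx, child in enumerate(progeny):
        new_father = father_of(child)
        extended = []
        for part in partitions:
            # Add the new progeny to each element whose father it merges with.
            for i, (f, members) in enumerate(part):
                merged = merge(f, new_father)
                if merged is not None:
                    extended.append(part[:i] + [(merged, members + (idx,))] + part[i + 1:])
            # Or give it a father of its own.
            extended.append(part + [(new_father, (idx,))])
        partitions = extended
    return partitions
```

For the four progeny $p_1, \dots, p_4$ of the example in Fig. 2 this finds 12 valid partitions out of the 15 partitions of four elements; the smallest valid partitions have two elements, for example $\{p_1, p_4\}$ and $\{p_2, p_3\}$.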

It is possible to find the minimum number of fathers from this method by taking the smallest number of elements among all valid
partitions found. This is usually much more efficient than first generating the full set of partitions of the progeny and subsequently
checking which partitions are valid; how much more efficient depends on the data. Our algorithm for finding the minimum number of
fathers is summarised in Fig. 1. It recursively builds all valid partitions of a set of progeny S, except some partitions that can be
shown not to correspond to the minimum number of fathers. In Fig. 2 we show a search tree for a simple set of four progeny.
Each progeny has two loci, and at each locus the allele corresponding to the mother has been subtracted, so that each progeny is
described by a list of two alleles (one for each locus): $p_1 = (a, b)$, $p_2 = (c, d)$, $p_3 = (c, e)$, and $p_4 = (a, d)$. It is assumed that
a, b, c, and d are four different allelic types.

We conclude this section by emphasizing four important points. First, note that if we have a given number of fathers for
progeny 1, ..., j, the number of fathers for progeny 1, ..., j + 1 must be at least as high. Hence, whenever the set of fathers is
at least as large as the minimum number found so far, we can stop searching for partitions of S based on the current partition.
When we have a complete partition of S which is smaller than the minimum found so far, we can update the minimum. As
a starting point we can use any valid upper bound; in the present implementation we use the trivial bound that the minimum
number of fathers satisfies $n^* \le \lceil |S|/2 \rceil$ (any two progeny can share a heterozygous father). Using our algorithm, we have

$$n^* = \mathrm{MinFathers}(\emptyset, S, \lceil |S|/2 \rceil). \qquad (1)$$
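Fig. 1 together with Eq. (1) can be turned into the following runnable Python sketch. It is a simplified illustration, not the authors' implementation: progeny are assumed to have unambiguous paternal alleles (no loop over variants and no sorting), and the helper names are hypothetical. It does include the bound check of the first point and the removal of directly inheritable progeny discussed in the third point.

```python
import math
from typing import FrozenSet, List, Optional, Sequence, Tuple

Father = Tuple[FrozenSet[str], ...]  # hypothetical: assigned alleles per locus
Progeny = Tuple[str, ...]            # hypothetical: one paternal allele per locus


def father_of(child: Progeny) -> Father:
    return tuple(frozenset([a]) for a in child)


def merge(f: Father, g: Father) -> Optional[Father]:
    merged = tuple(fl | gl for fl, gl in zip(f, g))
    return merged if all(len(m) <= 2 for m in merged) else None


def inheritable(f: Father, child: Progeny) -> bool:
    """The father already carries the child's paternal allele at every locus."""
    return all(a in fl for a, fl in zip(child, f))


def min_fathers(fathers: List[Father], progeny: List[Progeny], n: int) -> int:
    if not progeny:                 # complete partition: keep it if smaller
        return min(n, len(fathers))
    if len(fathers) >= n:           # cannot beat the best solution found so far
        return n
    p, rest = progeny[0], progeny[1:]
    q = father_of(p)
    # Either merge p into one of the existing fathers ...
    for i, f in enumerate(fathers):
        f_new = merge(f, q)
        if f_new is not None:
            remaining = [s for s in rest if not inheritable(f_new, s)]
            n = min(n, min_fathers(fathers[:i] + [f_new] + fathers[i + 1:], remaining, n))
    # ... or give p a new father of its own.
    remaining = [s for s in rest if not inheritable(q, s)]
    return min(n, min_fathers(fathers + [q], remaining, n))


def minimum_number_of_fathers(progeny: Sequence[Progeny]) -> int:
    """Eq. (1): start from no fathers and the upper bound ceil(|S|/2)."""
    return min_fathers([], list(progeny), math.ceil(len(progeny) / 2))
```

For the four-progeny example it returns 2; for seven single-locus progeny carrying five distinct paternal alleles it returns 3, below the starting bound $\lceil 7/2 \rceil = 4$.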

Second, the algorithm as described above is valid only if the genetic material inherited from the father at each locus is
uniquely determined. In general this is not the case, as pointed out above: there may be one or more loci with two
possible choices. In this case, when generating the valid partitions containing a given progeny, we loop over all possible variants
of alleles at these loci. In practical data it is rare to have more than one such locus, but if many loci are considered this could
cause problems: the number of variants that may need to be considered is $2^m$, where $m$ is the number of loci with
multiple choices in the progeny. For data sets where this is a problem, it is possible to extend the algorithm to a more complex
merging operation, where for each locus of a father we keep track of all possible pairs of alleles that can simultaneously match
all progeny deriving from the father.
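The loop over variants amounts to a Cartesian product over the candidate paternal alleles at each locus. A minimal Python sketch (the function name is hypothetical), taking for each locus the set of possible paternal alleles:

```python
from itertools import product
from typing import List, Set, Tuple


def variants(candidates: List[Set[int]]) -> List[Tuple[int, ...]]:
    """All single-allele versions of a progeny whose paternal allele is
    ambiguous at some loci; there are 2**m of them for m two-way loci."""
    return [tuple(v) for v in product(*(sorted(c) for c in candidates))]
```

With two ambiguous loci out of four, `variants([{3}, {1, 2}, {5}, {7, 8}])` yields the four variants (3, 1, 5, 7), (3, 1, 5, 8), (3, 2, 5, 7), and (3, 2, 5, 8).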

Third, if we find that some of the progeny not yet included in the current set of fathers can be directly inherited from any of
the fathers (i.e., the father contains the necessary genetic material at all loci), we can safely remove them from the set of
progeny. In other words, if a new progeny p can be directly inherited from a father f, a merge f' between f and p will lead
to f' = f, i.e., no change in f. Hence, given the present set of fathers, it is not possible to find another way of
merging these progeny which will lead to fewer fathers for the whole set. In our algorithm, whenever we add a new father to
the partition, we remove the progeny that can be directly inherited from the new father before recursively searching for new
partitions.
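A minimal Python sketch of this removal step, with the same hypothetical encoding of a father as one set of assigned alleles per locus. Note that "directly inherited" is stricter than "mergeable": a progeny that would need a still-undetermined slot is kept, because merging it would change the father.

```python
from typing import FrozenSet, List, Tuple

Father = Tuple[FrozenSet[int], ...]  # hypothetical: assigned alleles per locus
Progeny = Tuple[int, ...]            # hypothetical: one paternal allele per locus


def inheritable(father: Father, child: Progeny) -> bool:
    """True if the father already carries the child's paternal allele at
    every locus, so that merging the child would leave the father unchanged."""
    return all(a in alleles for a, alleles in zip(child, father))


def remove_inheritable(father: Father, progeny: List[Progeny]) -> List[Progeny]:
    return [c for c in progeny if not inheritable(father, c)]
```

For a father {1, 2} / {3, *}, progeny (1, 3) and (2, 3) are removed, while (1, 4) is kept even though it could still be merged.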

Fourth, in general the key to an efficient solution of a combinatorial problem such as the present one lies in cutting away
as large parts of the search space as early as possible. In the present context, this means that if we consider the most
constraining progeny first, we can discard at an early stage partitions that will not be valid for the whole set of progeny, or that
will be larger than the minimum. We sort the progeny first with respect to the number of undetermined symbols (if any) and,
where the number of undetermined symbols is equal, with respect to the number of multiple-choice loci in the progeny, so that
individuals with multiple-choice or undetermined loci are considered first. Before the sorting, we check the multiple-choice
loci. Consider a multiple-choice locus in a given progeny. If the two alleles occur only together at that locus in all progeny,
or if only one of the two alleles occurs alone or together with some third allele, we can safely replace the multiple choice by the more
frequent of the two allelic types: picking the less frequent allelic type can only exclude some merges that would have been
possible with the more common allele. However, if both alleles also occur alone or with some third allelic type, we cannot safely
conclude which choice will lead to the minimum number of fathers. In this case it is necessary to keep both options.
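The ordering can be sketched as a sort key. In this hypothetical encoding, each locus of a progeny is either `None` (undetermined) or the set of possible paternal alleles, with two elements at a multiple-choice locus; the resolution of multiple-choice loci described above is not included.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical encoding: None = undetermined locus, otherwise the set of
# possible paternal alleles (two elements at a multiple-choice locus).
Locus = Optional[FrozenSet[int]]
Progeny = Tuple[Locus, ...]


def search_order(progeny: List[Progeny]) -> List[Progeny]:
    def key(child: Progeny):
        undetermined = sum(1 for c in child if c is None)
        multiple = sum(1 for c in child if c is not None and len(c) > 1)
        # More undetermined loci first; ties broken by more multiple-choice loci.
        return (-undetermined, -multiple)
    return sorted(progeny, key=key)  # stable: ties keep their input order
```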

B. Generating samples using the coalescent 

In this section we describe how we have used the coalescent to generate artificial samples in order to test the search algorithm
described in the previous section.

In order to understand under which circumstances the minimum number of fathers equals the true number of fathers, we
generate $n_p$ progeny with a known number of fathers, $n_f$, as follows. First, the gene genealogies of the $L$ loci in a mother and
$n_f$ fathers are generated according to the standard neutral coalescent theory for an unstructured population of constant size
$N$. Because we assume that the loci are unlinked, the gene genealogies of different loci are statistically independent. In each
branch of the genealogies, mutations occur with probability $\mu$ per generation, so that the number of mutations in a branch of
$T$ generations is Poisson distributed with mean $\mu T$. In the coalescent, time is measured in units of $2N$ generations, and the
mutation rate is given by the scaled parameter $\theta = 4N\mu$.
We model microsatellite data using the stepwise mutation model, where each mutation leads to either the gain or the loss of
a single repeat unit (Kimura and Ohta, 1975, 1978). Thus, given the genealogy, for each locus we start from the most recent
common ancestor of the whole sample and assign it a reference allele (in the stepwise mutation model only differences in the number
of repeat units are relevant). We then recursively generate the alleles of each node in the genealogy by generating the stepwise
mutations along each branch as described above, until we have assigned alleles to all individuals in the sample.
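The simulation of a single locus can be sketched in Python as follows. This is a minimal illustration under the assumptions stated above (Kingman coalescent for a constant-size population, time in units of $2N$ generations, $\theta = 4N\mu$, symmetric $\pm 1$ steps); the function names are hypothetical, and allele sizes are reported relative to the most recent common ancestor, which is assigned 0. A branch of length $t$ coalescent units spans $2Nt$ generations and therefore carries on average $2N\mu t = \theta t/2$ mutations.

```python
import random
from typing import Dict, List, Tuple


def poisson(mean: float, rng: random.Random) -> int:
    """Number of events of a unit-rate Poisson process up to time `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def coalescent_smm(n_genes: int, theta: float, rng: random.Random) -> List[int]:
    """Allele sizes (relative to the MRCA) of n_genes >= 1 genes at one locus."""
    node_time: Dict[int, float] = {i: 0.0 for i in range(n_genes)}
    children: Dict[int, Tuple[int, int]] = {}
    lineages = list(range(n_genes))
    t, next_id = 0.0, n_genes
    # Kingman coalescent: with k lineages, each of the k(k-1)/2 pairs
    # coalesces at rate 1 (time in units of 2N generations).
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        children[next_id] = (a, b)
        node_time[next_id] = t
        lineages.append(next_id)
        next_id += 1
    # Stepwise mutation model: allele 0 at the MRCA, then walk down the tree.
    root = lineages[0]
    allele = {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            length = node_time[node] - node_time[child]
            n_mut = poisson(theta * length / 2, rng)  # mean theta * t / 2
            allele[child] = allele[node] + sum(rng.choice((-1, 1)) for _ in range(n_mut))
            stack.append(child)
    return [allele[i] for i in range(n_genes)]
```

For a mother and $n_f$ fathers one would call it with $2(1 + n_f)$ genes, independently for each of the $L$ loci, and read genes $2i$ and $2i+1$ as the two alleles of individual $i$; this pairing is our assumption.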

Given the allelic types of the mother and the fathers, we produce the offspring as follows: for each progeny we pick a father at random,
such that each father is equally likely to be picked. If in the end not every father has been picked for at least one
offspring, we repeat the whole process until this is the case. This guarantees that the true number of fathers is exactly $n_f$. For
each locus we then form the progeny genotype according to Mendelian inheritance, by picking one allele from the mother and
one from the father. With this procedure the marginal distribution of the number of offspring per father is approximately binomial,
which is consistent with the neutral theory and with empirical data for the snails (Mäkinen et al., 2007).
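A minimal Python sketch of this step (the function name and the genotype encoding are hypothetical): fathers are drawn uniformly, the draw is repeated until every father is used, and each locus receives one random allele from each parent.

```python
import random
from typing import List, Tuple

Genotype = List[Tuple[int, int]]  # hypothetical: one (allele, allele) pair per locus


def make_brood(mother: Genotype, fathers: List[Genotype], n_progeny: int,
               rng: random.Random) -> List[Tuple[int, Genotype]]:
    """Return (father index, genotype) for each progeny."""
    n_f = len(fathers)
    if n_progeny < n_f:
        raise ValueError("need at least one offspring per father")
    # Assign fathers uniformly at random; redraw until every father is used,
    # so that the true number of fathers is exactly n_f.
    while True:
        sires = [rng.randrange(n_f) for _ in range(n_progeny)]
        if len(set(sires)) == n_f:
            break
    # Mendelian inheritance: one random allele from each parent at each locus.
    return [(s, [(rng.choice(m), rng.choice(f)) for m, f in zip(mother, fathers[s])])
            for s in sires]
```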

The performance of 'MinFathers' depends on the amount of genetic variation (determined by the mutation rate $\theta$) and on the
number of progeny per father, $n_{p/f} = n_p/n_f$. When $\theta$ is small, the minimum number of fathers is small and the algorithm
terminates quickly. When $\theta$ is large the algorithm is also efficient, because it can usually eliminate impossible merges at an early
stage. When $\theta$ is intermediate and $n_{p/f}$ is large, however, the algorithm may have to investigate a significant fraction of the
possible combinations, and in these cases it may not be practical to use the algorithm. Despite this caveat, Tab. I and the results
presented in the next section show that the algorithm can be used on empirical data with a large number of progeny and across a
large range of the parameters $\theta$ and $n_{p/f}$.

III. RESULTS 

In this section we describe the results obtained with the new algorithm proposed in Sec. II.

A. Application to empirical data 

We have applied our algorithm to the L. saxatilis data of Mäkinen et al. (2007) and to a new L. saxatilis data set
(Bostrom et al., 2009, in preparation), given in Tab. I. We begin by describing our results for the new data set (Tab. I). The
original data contain more progeny than listed in Tab. I. Using our algorithm we find that the minimum number of sires is
twelve, as given in the first row of Tab. II. The algorithm GERUD 2.0 could not be run, because it determines the
exact solution only for up to six sires. We have also run COLONY (Wang, 2004), which estimates the most likely number of
sires; the corresponding results are shown in Fig. 3. It is not entirely clear whether the algorithm has converged: the log
likelihood may have reached a plateau, but further exploration of the state space may yield still higher
likelihood values. It is therefore valuable to have the exact lower bound from our new algorithm (also shown in Fig. 3).

Using the new algorithm, MinFathers, we have re-analysed the data of Mäkinen et al. (2007); the corresponding results are
also given in Tab. II. The last eight data sets in Tab. II were analysed using GERUD by Mäkinen et al. (2007), and the results are broadly
consistent with the corresponding results of COLONY. It must be noted, however, that in Mäkinen et al. (2007) the search was
performed on three loci only, using an ad hoc truncation of the possible set of fathers. It is therefore of interest to determine
what the minimum number of sires actually is. The corresponding results are shown in Tab. II and provide an exact lower bound
for the most likely number of fathers determined by Mäkinen et al. (2007). Except in two cases, GERUD 2.0 could not be run,
because the true minimum number of fathers exceeded the maximum value of six.

The results summarised in Tab. II raise the question of how much larger than the exact minimum one expects the most likely
number of fathers to be. The answer depends upon the population model and upon the parameters describing it, such as the
mutation rate $\theta$, the number $L$ of loci, and the number $n$ of progeny in the data. This question is addressed in the following
section.

B. The difference between the minimum and the most likely number of sires 

In this section we consider a population in Hardy-Weinberg equilibrium, and we assume that all loci are subject to neutral evolution
and that all loci are in pairwise linkage equilibrium. We pose the question: how much larger than the minimum number is
the most likely number of sires in a brood of a given mother? To this end we generated samples using the coalescent as described
in Sec. II.B.

Figs. 4 and 5 summarise our results. First, Fig. 4 shows the probability $P_{\rm same}$ that the true number of fathers equals the minimum
number of fathers, as a function of the number $n_{p/f}$ of progeny per father, for three values of the mutation rate: $\theta = 1$,
$\theta = 10$, and $\theta = 100$. When $\theta$ is large, we observe a sharp transition where $P_{\rm same}$ increases from almost zero to almost unity.
When $\theta$ is small, however, sampling many offspring does not result in a significant increase of $P_{\rm same}$, because the fathers are too
genetically similar. For intermediate values of $\theta$, $P_{\rm same}$ does increase with the number of progeny per father but never reaches
unity, because the fathers are still significantly genetically correlated.

Second, Fig. 5 shows how $P_{\rm same}$ changes as a function of $\theta$ for five different values of the number of loci $L$. When $L = 1$, the
minimum number of fathers is always smaller than the true number of fathers (unless there is only one father); therefore $P_{\rm same}$
equals zero. For $L \ge 2$ we find that $P_{\rm same}$ increases as a function of both $\theta$ and $L$. The extent to which the minimum number of
fathers agrees with the true number of fathers depends on the probability that the fathers are all genetically distinct. As $L$
increases, this happens for smaller values of $\theta$, as indicated by the increasingly sharp transitions from $P_{\rm same} = 0$ to $P_{\rm same} = 1$ in
the figure.

Thus, when the mutation rate, the number of loci, and the number of offspring per father are sufficiently high that $P_{\rm same} \approx 1$, the
minimum number of fathers almost always equals the true number of fathers. As a consequence, the resulting number of sires
contributing to a given family does not depend on whether the population is structured or panmictic. In other words, the result is
insensitive to assumptions about the underlying population structure.



IV. DISCUSSION 

We conclude with a discussion of, first, how the minimum number of fathers is influenced by genotyping errors. Second, we
address the following question: if the aim is to increase the probability of deducing the true number of fathers from empirical
data such as those in Tab. I, is it better to sample more offspring or more loci?

Microsatellites can be prone to genotyping errors (see the reviews by DeWoody et al., 2006; Hoffman and Amos, 2005;
Selkoe and Toonen, 2006). Sources of error when genotyping microsatellites include stuttering (appearance of PCR products
one or more repeats shorter than an actual allele), allele dropout (non-amplification of one of the two alleles in a heterozygote,
usually the longer one), non-specific PCR products due to annealing of primers to multiple sites, null alleles (alleles with a
mutation in the primer region, which prevents their amplification in PCR) and, finally, mistyping and other mistakes during manual
scoring of the results.

Changes to the number of repeats (PCR stuttering events) may cause the minimum number of fathers to appear larger than it 
actually is. PCR stuttering events in a microsatellite locus are usually incremental (single additions or deletions of a repeat unit). 
Hence, when the sample size is large, it is likely that the resulting alleles are present in other fathers (the effect of PCR stuttering 
is similar to mutations occurring during meiosis). Therefore, when the frequency of stuttering events is small, the effect on the 
minimum number of fathers is expected to be small. 

When only a single allele is amplified (e.g., because of allele dropout or null alleles), the minimum number of fathers of the
sample may increase or decrease. If the frequency of such errors is small, however, the effect is expected to be small: assuming that
the incidence of these errors is independent across loci, the probability that the errors change the minimum number of
fathers decreases rapidly with increasing number of loci.

We now turn to the question of how best to increase the accuracy of estimating the true number of fathers. When the number
of offspring is large, as in the marine snails, there are two possibilities for increasing the probability that we can deduce the true
number of fathers. One may either sample the same loci in more offspring, or sample more loci in the offspring one
already has. Which is the better option? Our results show that increasing the number of loci generally provides the quickest
way of increasing the probability of finding the true number of fathers from the minimum number of fathers (see Figs. 4 and 5), but whether this is
feasible depends on the availability and cost of additional high-quality markers. It may be less costly
to sample more individuals with fewer loci, if possible. In this case, however, the accuracy may be limited not only by the number of
offspring available but also by the genetic variation at the loci. If few loci are sampled and the mutation rate is low, our results show
that it may not be possible to increase the accuracy beyond a certain limit by sampling more individuals; this limit is determined by
the probability that the fathers share the same alleles.

In order to relate the theoretical discussion of Sec. III.B to the empirical data (Tabs. I and II), we have estimated the parameter $\theta$
from the data in Mäkinen et al. (2007) using two standard estimators: $\theta_V = \langle (x_i - y_j)^2 \rangle$ (Wehrhahn, 1975) and
$\theta_F = (F^{-2} - 1)/2$ (Ohta and Kimura, 1973). Here $x_i$ and $y_j$ are alleles of progeny $i$ and $j$ from mothers $x$ and $y$, respectively, and $F$ is the
homozygosity (the probability that $x_i = y_j$). It is known that $\theta_V$ is unbiased but has a large variance (Zhivotovsky and Feldman,
1995), whereas $\theta_F$ is biased for large values of $\theta$ but has a smaller variance. For the data in Tab. II we obtain $\theta_V = 124$ and
$\theta_F = 49$ when averaged over all five loci. While these estimates are uncertain because of the small sample size, we see from
Figs. 4 and 5 that the number of progeny sampled in Mäkinen et al. (2007) ($n_p = 21$) is probably too low to estimate reliably
the true number of fathers (the estimated $n_{p/f}$ is in the range 2 to 4 for these data). Increasing the number of loci is not likely
to help very much. The progeny sampled were chosen from large families (of 70 to 100 progeny). Our results indicate that in
this case the best strategy for increasing the accuracy of the number of fathers is to sample still more progeny. Indeed, analysis
of the full set of progeny in this data set indicates that the true number of fathers in these families is significantly higher than the
minimum number of fathers reported in Tab. II (Bostrom et al., 2009, in preparation).
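For reference, both estimators can be computed at a single locus with the short Python sketch below. Forming all pairs between the progeny of two mothers and the function name are our assumptions; the $\theta_F$ line inverts the stepwise-mutation homozygosity $F = (1 + 2\theta)^{-1/2}$, which gives $\theta_F = (F^{-2} - 1)/2$.

```python
from typing import Sequence, Tuple


def theta_estimates(x: Sequence[int], y: Sequence[int]) -> Tuple[float, float]:
    """Return (theta_V, theta_F) at one locus from allele sizes of two families."""
    pairs = [(xi, yj) for xi in x for yj in y]
    # Variance-type estimator: mean squared difference in repeat number.
    theta_v = sum((xi - yj) ** 2 for xi, yj in pairs) / len(pairs)
    # Homozygosity F: fraction of pairs with identical alleles.
    f = sum(xi == yj for xi, yj in pairs) / len(pairs)
    # Invert F = (1 + 2 theta)^(-1/2); F = 0 gives no finite estimate.
    theta_f = (f ** -2 - 1) / 2 if f > 0 else float("inf")
    return theta_v, theta_f
```

For allele sizes (10, 12) and (10, 11) the sketch gives $\theta_V = 1.5$ and $\theta_F = 7.5$.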
function MinFathers(F, S, n) return integer is
begin
   if S is empty and |F| < n then
      n := |F|
      Print fathers in F
   else if |F| < n then
      pick a progeny p ∈ S
      for each variant q of p loop
         for each valid father f' ∈ {Merge(f, q) : f ∈ F} loop
            S'' := {s ∈ S \ {p} : s cannot be inherited from f'}
            n := min(n, MinFathers((F \ {f}) ∪ {f'}, S'', n))
         end loop
         S' := {s ∈ S \ {p} : s cannot be inherited from q}
         n := min(n, MinFathers(F ∪ {q}, S', n))
      end loop
   end if
   return n
end MinFathers



FIG. 1 Algorithm for finding the minimum number of fathers. S is the set of progeny, F is the set of fathers ($F = \emptyset$ initially), and n is the
minimum number of fathers for the whole set of progeny found so far ($\lceil |S|/2 \rceil$ initially). See the text for an explanation of the algorithm.
Under which circumstances a loop over variants q of p is necessary is explained in the text.
FIG. 2 Search tree for four progeny constructed by the algorithm described in Sec. II. Each progeny has two loci, and at each locus the allele corresponding to the mother has been subtracted, so that each progeny is described by a list of two alleles (one for each locus): p1 = (a, b), p2 = (c, d), p3 = (c, e), and p4 = (a, d). It is assumed that a, b, c, and d are four different allelic types. The figure illustrates how large parts of the search tree can be cut away (yellow) because they need not be visited, which may speed up the algorithm considerably. Each box represents the state of the algorithm at one iteration: F is the set of fathers (each father is represented by the list of offspring assigned to it), and S is the set of offspring that the algorithm has yet to assign to a father. The states are visited from top to bottom and from left to right, following the lines that emanate from the bottom of each node, except at the terminal nodes (shown as squares). The algorithm stops when the set S is empty. A set of offspring that cannot have been sired by a single father (i.e. the father is invalid) is shown in bold italic font. The states coloured yellow are never visited, either because they descend from a state with an invalid father, or because it can be seen that they cannot lead to a solution with fewer fathers than the best one found so far in the search.
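For comparison with the pruned tree, one way to carry out the simple exhaustive search is to enumerate every partition of the progeny into groups and keep the partitions in which each group could share one father. The sketch below is an illustrative baseline under assumptions (a single variant per progeny, None for an undetermined allele, hypothetical function names); it is not the procedure of GERUD or of the algorithm above. The number of partitions grows as the Bell numbers (15 for four progeny, already 115975 for ten), which is why such a search quickly becomes impractical.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

Progeny = Tuple[Optional[str], ...]  # one paternal allele per locus; None = undetermined


def partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of items into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):     # add `first` to an existing group
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part         # or put it in a group of its own


def can_share_father(group: Sequence[Progeny]) -> bool:
    """A diploid father carries at most two alleles at each locus."""
    return all(len({p[locus] for p in group if p[locus] is not None}) <= 2
               for locus in range(len(group[0])))


def exhaustive_min_fathers(progeny: Sequence[Progeny]) -> int:
    valid = (part for part in partitions(list(range(len(progeny))))
             if all(can_share_father([progeny[i] for i in g]) for g in part))
    return min(len(part) for part in valid)


fig2 = [("a", "b"), ("c", "d"), ("c", "e"), ("a", "d")]
print(exhaustive_min_fathers(fig2))  # 2
```

For the four progeny of Fig. 2 one father is impossible (the second locus would need b, d, and e), while two fathers suffice, for instance with p1 and p4 sharing one father and p2 and p3 the other.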




FIG. 3 A run of COLONY (Wang, 2004) for the data shown in Tab. I. Top: time evolution of the log likelihood of the data. Bottom: most likely number of sires as a function of the number of iterations (dots), together with the exact lower bound provided by the new algorithm, MinFathers (dashed line).
FIG. 4 The probability p_same that the number of fathers m is equal to the minimum number of fathers, as a function of the number of progeny per father, for three values of θ: θ = 1 (diamonds), θ = 10 (circles), and θ = 100 (triangles). The actual number n_f of fathers is 7, and there are three loci (L = 3). Each data point is based on 1000 families.
FIG. 5 The probability that the most likely number of fathers is larger than the minimum number, as a function of the mutation rate θ, for five different values of the number of loci L. Each data point is based on 1000 families with n_f = 7 fathers and n_p = 70 offspring (n_p/n_f = 10).




Tables 






# L1

±L 
1 { 

2{ 
3{ 
4{ 
5 {



L2 



L3 



L4 



L5 



6{ 
7 {

8 { 

9 { 
10 { 

11 {

12 { 

13 { 

14 { 

15 { 

16 {
17 {
18 {

19 { 

20 { 

21 { 

22 { 

23 { 

24 { 

25 { 

26 { 

27 { 

28 { 

29 { 

30 { 

31 { 

32 { 

33 { 

34 { 

35 { 

36 { 

37 { 

38 { 

39 { 

40 { 

41 { 

42 {



11 
51 

51 
51 
51 
51 
51 
92 
51 
51 
51 
92 
73 
51 
51 
69 
92 
51 
92 
51 
73 
82 
51 
92 
51 
92 
51 
51 
51 
51 
92 
51 
92 
51 
51 
92 
92 
51 
51 
51 
92 
92 
92 



192} { 227. 



242 } { 225 . 



231 } { 217 . 



223 } { 199 . 



,204} 
211 } 
195 } 
181 } 
195 } 
192 } 
208 } 
173 } 

,208 } 
151 } 

,208 } 
192 } 

,201 } 
201 } 
192 } 

,201 } 
192 } 
201 } 
173 } 
192 } 
192 } 
204} 
204} 
181 } 
201 } 
208 } 
195 } 
208 } 
201 } 
217 } 
195 } 
206 } 
208 } 
204} 
194} 
192 } 
194} 
206} 
168 } 
195 } 
192 } 

,217 } 



{ 236. 
{ 227 
{ 227 
{ 236. 
{ 227 . 
{ 227 
{ 227 
{ 227 . 
{ 236. 
{ 227 
{ 227 
{ 227 . 
{ 236. 
{ 236. 
{ 227 
{ 236. 
{ 236. 
{ 227 
{ 227 
{ 236. 
{ 227 
{ 236; 
{ 227 . 
{ 227 . 
{ 227 
{ 236; 
{ 236. 
{ 239. 
{ 227 
{ 227 
{ 236. 
{ 236. 
{ 236., 
{ 227 
{ 230. 
{ 236. 
{ 227 . 
{ 227 
{ 227 
{ 227 . 
{ 227 . 
{ 236. 



242 } 
242 } 
236 } 
242 } 
230 } 
236 } 
236 } 
227 } 
242 } 
230 } 
239 } 
236 } 
242 } 
242 } 
227 } 
242 } 
242 } 
236 } 
242 } 
242 } 
236 } 
242 } 
236 } 
236 } 
236 } 
242 } 
242 } 
242 } 
242 } 
236 } 
242 } 
242 } 
242 } 
239 } 
242 } 
242 } 
230 } 
230 } 
236 } 
236 } 
236 } 
242 } 



{ 222 
{ 210 
{ 210. 
{ 213 
{ 210. 
{ 222 
{ 219 
{ 213 
{ 219 
{ 216 
{ 225 
{ 213 
{ 216 
{ 213 
{ 219 
{ 219 
{ 213 
{ 219 
{ 225 
{ 213 
{ 213 
{ 222 
{ 222 
{ 219 
{ 213 
{ 225 
{ 210. 
{ 225 
{ 210. 
{ 216 
{ 210. 
{ 219 
{ 219 
{ 210. 
{ 219 
{ 222 
{ 210. 
{ 219 
{ 222 
{ 216. 
{ 213 
{ 222 



231 } 
231 } 
225 } 
225 } 
225 } 
225 } 
231 } 
231 } 
231 } 
231 } 
225 } 
231 } 
231 } 
231 } 
231 } 
225 } 
225 } 
231 } 
231 } 
231 } 
225 } 
225 } 
225 } 
231 } 
231 } 
231 } 
225 } 
231 } 
231 } 
231 } 
225 } 
225 } 
231 } 
225 } 
225 } 
231 } 
225 } 
225 } 
225 } 
225 } 
231 } 
225 } 



{ 217 
{ 217 
{ 217 
{ 223 
{ 217 
{ 217 
{ 217 
{ 217 
{ 217 
{ 217 
{ 184 
{ 223 
{ 217 
{ 184 
{ 217 
{ 223 
{ 223 
{ 217 
{ 217 
{ 184 
{ 217 
{ 184 
{ 217 
{ 217 
{ 223 
{ 217 
{ 214 . 
{ 184. 
{ 217 
{ 214 . 
{ 217 
{ 169 
{ 217 
{ 217 
{ 217 
{ 217 
{ 202 
{ 217 
{ 217 
{ 217 
{ 184. 
{ 220. 



223 } 
223 } 
217 } 
223 } 
217 } 
217 } 
223 } 
232 } 
223 } 
223 } 
223 } 
226 } 
223 } 
223 } 
220 } 
223 } 
223 } 
235 } 
226 } 
223 } 
223 } 
217 } 
223 } 
217 } 
223 } 
226 } 
217 } 
223 } 
223 } 
223 } 
223 } 
217 } 
223 } 
217 } 
217 } 
223 } 
217 } 
217 } 
220 } 
223 } 
217 } 
223 } 



99 
99 
96 
99 
96 
84 
93 
93 
99 
84 
99 
93 
99 



{ 
{ 
{ 
{ 
{ 
{ 
{ 
{ 
{ 
{ 
{ 
{ 
{ 

{ 202 
{ 202 
{ 199 
{ 193 
{ 202 
{ 193 
{ 202 
{ 190 
{ 202 

{ 
{ 
{ 
{ 
{ 
{ 

{ 202 



99 
99 
99 
99 
84 
99 
{ 184
{ 196
{ 202
{ 202 
99 
96 
99 
99 



202 } 
202} 
223 } 
202 } 
199 } 
202} 
199 } 
199 } 
199} 
199 } 
199 } 
223 } 
199 } 
202} 
223 } 
205 } 
199 } 
199 } 
202} 
199 } 
202 } 
202} 
202} 
199 } 
202 } 
199 } 
202} 
199 } 
199} 
223 } 
202 } 
202 } 
202 } 
205 } 
199 } 
202 } 
199 } 
202 } 
205 } 
205 } 
202 } 
202 } 
199 } 



TABLE I L. saxatilis. Multi-locus genotypes (five loci) of a mother (index 0) and n = 42 progeny from a clutch (… in preparation).
Female     n     GERUD 2.0   COLONY   MinFathers
Tab. I     42    –           13^b     12
E1-M^c     21    –           8        7
E7-M^c     22    –           9        8
E4-M^d     21    –           7        7
S1-M^d     23    4           4        4
S5-M^d     23    –           9        8
S2-M^d     23    5           5        5
S3-M^d     23    –           10       8

TABLE II Comparison of the results (minimum or most likely number of sires, respectively) of three algorithms for different progeny arrays; n is the number of offspring in the sample. 'MinFathers' denotes the results of the algorithm described in Sec. II; in all cases all five loci are taken into account. Footnotes: (a) The symbol – denotes that the number of sires exceeded the maximum possible number for an exhaustive search in GERUD 2.0. (b) Most likely number of fathers according to COLONY. (c) Data taken from Mäkinen et al. (2007). (d) Data taken from Boström et al. (2009, in preparation).
father f    new father    common father f'
any         {*, *}        any
{*, *}      {a, *}        {a, *}
{a, *}      {a, *}        {a, *}
{a, *}      {b, *}        {a, b}
{a, b}      {a, *}        {a, b}
{a, b}      {b, *}        {a, b}
{a, b}      {c, *}        no common father

TABLE III The possible outcomes of merging a father f with a new father, resulting in the common father f'. Undetermined alleles are represented by an asterisk (*), and a, b, and c are three different allelic types. In the first row, the allele of the new progeny, and thus the configuration of its father, is undetermined, so that f' = f. In the final row, f' would have to contain three different alleles, which is impossible; therefore no common father exists in this case.
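At a single locus, the merge rule of Table III amounts to a set union that fails once more than two distinct alleles would be required. In the minimal sketch below, an assumption of this illustration, a single-locus paternal genotype is the set of its determined alleles, so {a, *} becomes {"a"} and {*, *} the empty set; `merge_locus` and `render` are hypothetical helper names.

```python
from typing import FrozenSet, Optional

Locus = FrozenSet[str]  # determined alleles only; missing slots are undetermined (*)


def merge_locus(father: Locus, new_father: Locus) -> Optional[Locus]:
    """Common father at one locus, or None if three or more alleles are needed."""
    common = father | new_father
    return common if len(common) <= 2 else None


def render(g: Optional[Locus]) -> str:
    """Write a single-locus genotype in the notation of Table III."""
    if g is None:
        return "no common father"
    return "{" + ", ".join(sorted(g) + ["*"] * (2 - len(g))) + "}"


AB, A, B, C, EMPTY = (frozenset("ab"), frozenset("a"), frozenset("b"),
                      frozenset("c"), frozenset())
# {a, b} stands in for "any" father in the first row.
rows = [(AB, EMPTY), (EMPTY, A), (A, A), (A, B), (AB, A), (AB, B), (AB, C)]
for f, new in rows:
    print(render(f), "+", render(new), "->", render(merge_locus(f, new)))
```

The seven printed lines follow the rows of Table III, ending with `{a, b} + {c, *} -> no common father`. Applying the same union locus by locus, and rejecting the merge if any locus fails, gives the multi-locus Merge used in Fig. 1.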



